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Abstract 

Song, Havlin and Makse (2005) have recently used a version of the box-counting 
method, called the node-covering method, to quantify the self-similar properties of 
43 cellular networks: the minimal number Ny of boxes of size I needed to cover all 
the nodes of a cellular network was found to scale as the power law Ny ~ (£+l)~ Dv 
with a fractal dimension Dy = 3.53 ±0.26. We propose a new box-counting method 
based on edge-covering, which outperforms the node-covering approach when ap- 
plied to strictly self-similar model networks, such as the Sierpinski network. The 
minimal number Ne of boxes of size I in the edge-covering method is obtained with 
the simulated annealing algorithm. We take into account the possible discrete scale 
symmetry of networks (artifactual and /or real) , which is visualized in terms of log- 
periodic oscillations in the dependence of the logarithm of Ne as a function of the 
logarithm of I. In this way, we are able to remove the bias of the estimator of the 
fractal dimension, existing for finite networks. With this new methodology, we find 
that Ne scales with respect to i as a power law Ne ~ £~ E with De = 2.67 ± 0.15 
for the 43 cellular networks previously analyzed by Song, Havlin and Makse (2005). 
Bootstrap tests suggest that the analyzed cellular networks may have a significant 
log-periodicity qualifying a discrete hierarchy with a scaling ratio close to 2. In 
sum, we propose that our method of edge-covering with simulated annealing and 
log-periodic sampling minimizes the significant bias in the determination of fractal 
dimensions in log-log regressions. 

Key words: Complex networks; cellular networks; self-similarity; fractal 
dimension; discrete scale invar iance; edge covering 
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1 Introduction 

In recent years, the study of complex networks have attracted extensive inter- 
est, covering biological, social, information, and technological systems [1,2,3]. 
Most complex networks exhibit small-world properties [4] and are scale free in 
the sense that the distribution of degrees has power-law tails [5]. Note that, 
as stressed recently by Keller [6], the existence of power law distribution in 
many complex networks is a re-discovery in this field of previously well-known 
mechanisms (see for instance chapters 14 and 15 of [7] which describes many 
mechanisms for power laws which have been described in several often quite 
different scientific contexts). In addition, many real networks have modular 
structures or communities [8] expressing their underlying functional modules. 
The fourth intriguing feature of some (not all) real networks reported recently 
is the self-similarity of the topology [9], characterized by a fractal dimension 
[10], and the scale invariance of degree distribution after coarse graining pro- 
cesses [11]. 

The fractal nature of a self-similar network can be revealed by utilizing the 
well-known box-counting method [10,12]. Song, Havlin, and Makse [9] have 
adopted a node-covering method, in which boxes of size £b = £ + 1 are used 
to cover all the nodes of a network, where I is the diameter of the subnetwork 
enclosed in the box, and the minimum number of node-covering boxes, Nv(£b), 
is determined for each If the network is self-similar, Nv{£b) scales with 
respect to £b as a power law 

N v ~l/(l+l) D - , (1) 
where Dy is the fractal dimension of the network. 

It is natural to raise the question of what is the underlying mechanism for 
a network to evolve into a self-similar structure [13]. An empirical study re- 
ports that two genetic regulatory networks, which are self-similar scale-free 
networks, are not assortative (i.e., cannot be separated into groups accord- 
ing to kind), a result confirmed by numerical simulations [14]. However, this 
claim was falsified by Song, et ai, who showed that the self-similar structure 
in complex networks was due to the repulsion between hubs [15]. In addition, 
self-similar networks have been shown to have the same fractal scaling as thei 
skeleton [16]. An alternative node-covering method was proposed based on the 
skeleton of the network under investigation [17]. 

Another important symmetry associated with scale invariance and fractals is 
discrete scale invariance (DSI) [18], which expresses the property that the frac- 
tal is self-similar only with respect to magnification factors which are integer 
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powers of a preferred scaling ratio A: when a fractal possesses the property of 
DSI, it is self-similar only under magnification with factors A, A 2 , A 3 , A 4 , and 
so on. The observable hallmark of discrete scale invariance is log-periodicity 
in the scaling of the observables as a function of scale or control parameter (s). 
For instance, significant log-periodic oscillations are observed in the degree 
distributions [19] and energy distributions [20] of model networks. However, 
there is no direct evidence, yet, showing the presence of log-periodicity in the 
topological structure of self-similar networks observed in Nature or in social 
structures. 

In this paper, we first study a model network, known as the Sierpinski gas- 
ket in the context of fractals, which is exactly self-similar with build-in log- 
periodicity. We propose a novel box-counting method based on edge-covering 
tiling, which significantly outperforms the node-covering method in the de- 
termination of fractal dimensions. A new method for detecting discrete scale 
invariance is also proposed and tested on the Sierpinski network. We then 
combine these two methods to obtain an unbiased estimator of the fractal 
dimension of self-similar networks. We apply this methodology to the forty 
three cellular networks investigated by Song, et al. [9] and find significant dif- 
ferences. In particular, it seems that previous results have been significantly 
biased upwards. We also suggest the existence of a weak discrete dichotomous 
hierarchical structure in many of the analyzed networks. 



2 A family of exactly self-similar networks 



2. 1 The construction of Sierpinski networks 



Let us first recall how to construct the family of Sierpinski triangle networks. 
The initiator is an equilateral triangle of unit length. Two replicas of the 
initiator are placed near the initiator to form an equilateral triangle, of the 
same form as the generator but with side length twice larger. Starting from 
the initiator of generation g = 1, we thus obtain in the first iteration of the 
construction the generation g = 2 formed of three initiators. The first three 
generations of the iterative construction of Sierpinski networks is depicted in 
Fig. 1. 

It is well-known that Sierpinski networks are self-similar. For a Sierpinski 
network of generation g, the number of nodes is 

3 g + 3 

n 9 = , (2) 

which is obtained of the recursion relation n g +i = 3n g — 3 with n g =\ = 3. The 
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Fig. 1. Construction of Sierpinski networks. A Sierpinski network of generation g is 
denoted A g . The three networks shown in this figure are thus Ai, A2 and A3. 

number of edges is simply 

m g = 3 9 . (3) 

For instance, a Sierpinski network of generation g = 6 has n g = 366 nodes 
and m g = 729 edges. 



2.2 Box-counting method by covering nodes 

We first adapt the node-covering method used by Song, Havlin, and Makse [9] 
to cover the nodes of a Sierpinski network A 9 . Let us start with a box of size 
£b = i = 1. Actually, three geometrical patterns should be considered which 
are of box size I = 1: (i) the triangle initiator Ai, (ii) an edge of unit length 
with two nodes, and (iii) an individual node. We use these three geometrical 
patterns to cover the nodes of Sierpinski networks. Figure 2 shows the obtained 
coverings of A g for g — 2, 3, 4 and 5 with boxes of size £ = 1. For instance, the 
optimal node-covering configuration of A 2 is obtained with just one triangle 
initiator Ai, one edge and one individual node. The optimal node-covering 
configuration of A3 uses three triangle initiators Ai, three edges and zero 
individual nodes. And so on. 

For a given network, its node-covering structure includes A3 triangle initiators 
Ai, N2 edges, and Ai nodes, giving a total number of covering boxes equal 
to Ny = Ai + A2 + A3. Note that the number of nodes is given by n g = 
3A r 3+2A r 2+A r i because the optimal node-covering patterns involve no overlaps. 
For g — 1, we have obviously Ay(l) = 1. For g > 2, each Sierpinski network 
A g can be viewed as made of A2 elements. For each A2 in a given A g , there 
is only one covering box which is a A x . For two adjacent A 2 's, there are no 
additional Ai boxes. Therefore, the maximal number of boxes Ax used in the 
covering is 

= 3 g ~ 2 . (4) 

These A^ Ai boxes cover 3A^g = 3 9_1 nodes. Therefore, the minimal number of 
nodes which are not covered by Ai boxes is n g — 3A"g = — 3 9 " 1 = 3 2 +3 . 
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g=4 



g = 5 



Fig. 2. Iterative algorithm for tiling Sierpinski networks A g for g = 2, 3, 4 and 5 
with boxes of size 1 = 1. The "boxes" are the triangle initiator Ai, an edge of unit 
length with its two nodes (represented by as thick segments in the figure) and an 
individual node (represented as a small filled circle in the figure). 

For this minimum number of still uncovered nodes, the maximum number of 
covering boxes that are edges (which we can refer to as "Min-Max") is 



~n g - 




"3 9 " 1 + 3" 


39-1 _ (-l)f-i 


2 




4 


4 



(5) 



where [x] is the integer part of x. Then, the remaining number of isolated 
nodes to cover with individual node boxes is 



3N' Z - 2N' 2 



[1 



]/2 



It follows immediately that 



N v > N[ + N' 2 + iVg 



7x 3 g ~ 2 + 4 + {-iy 



(6) 



(7) 



In other words, because the total number of covering boxes is minimized by 
using the maximum number of Ai's (since each cover three nodes), the above 
construction shows that N[ + N 2 + N% is the infimum of the minimal number 
Ny of covering boxes. 

For g = 2,3, 4, 5, Fig. 2 provides one possible tiling for each, where the infimum 
N[+N^+Ns is reached, i.e., the number of boxes is exactly [7 x 3 s " 2 + 4 + (-l) 9 ]/4. 
Together with Eq. (7), we synthesize the above results under the single ex- 
pression 



H{g- 1.5) x 



7x 3 9 " 2 + 4 + (-l)f 



+ H{1.5-g) 



where H(x) is the Heaviside function. Our numerical calculations show that 
this equation holds for all g > 5 cases that we have investigated. For the 
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covering of a Sierpinski network A 9 with boxes of size £ > 1, we do not have 
analytic expressions for Ny(£). However, our numerical simulations show that 
Ny(2 k ) = NE{2 k ) for 1 < k < g, where the definition of Ne is given in the 
next sub-section. 



2.3 Box-counting method by covering edges 

Instead of covering nodes as described in the previous sub-section, we propose 
to use boxes in order to cover all the edges. Mathematically, this problem 
can be described as follows. Consider the set E of all the edges of a given 
network. Let {Ei : i = 1, 2, ■ ■ • , Ne(£)} be a partition of E, that is, E$ C E, 
Ei PI Ej = $ (null set) for i ^ j, and E = U^^Ei. The size (or diameter) of 
Ei, denoted by d(Ei), is defined as the maximum distance among all possible 
pairs of nodes of Ei counted as the number of edges to go from one node to the 
other of the pair. For a given £, we construct an edge-covering by partitioning 
the set E into subsets {Ei} such that all their diameters d(Ei) are smaller 
than or equal to i for all i's. Then, the partition {Ei : i = 1, 2, ■ • ■ , Ne{£)} is 
said to be a covering of the network with boxes of size £. We look for the best 
edge-covering structures by minimizing Ne{£), for each i. An illustration of 
the edge-covering method is shown in Fig. 3 for three different values of £. 




1=1 £ = 2 £ = 3 

Fig. 3. (Color online) Illustration of the edge-covering method for three different 
"box" sizes £ = 1, 2, and 3. The nodes of the network are the circles and the 
edges of the network are shown as solid lines. The closed dashed lines delineate the 
sub-sets or boxes of diameter no larger than £. For £ = 1, two subset topologies are 
possible: edges joining two adjacent nodes and a triplet of nodes forming a triangle 
which also has diameter 1. For larger £, arbitrary complicated topologies for the 
sub-set/boxes are possible as long as their diameter remains smaller than or equal 
to £, as shown in the case £ = 2 and £ = 3. The edge-covering shown in the figure 
was obtained by our simulated annealing algorithm which provides in these simple 
cases optimal solutions. 

Note that our algorithm for covering edges also automatically covers the nodes 
(sometimes with some nodes multiply covered). Thus, the minimal number 
Ne{£) of boxes of a given size £ in the edge-covering method is not smaller 
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than Ny(£). Consider a (^-generation Sierpinski network and scales £ = 2 k with 
k — 0, 1, 2, • • • , g — 1. It is well-known that 

N E (2 k ) = 3 9 ^~ k . (9) 

Therefore, we have 

N E (£) = 3 9 - 1 ■ £~ De , (10) 

where D E = hi(3) / hi(2) ~ 1.5850 is the fractal dimension. On the other hand, 
the scale invariance symmetry of the Sierpinski network can be expressed as 

N E (£) = 3N E (2£) . (11) 

The general solution of this functional equation (11) takes the log-periodic 
structure 

N E (£)^£- D ^(\og 2 (£)) , (12) 

where <p(x + 1) = 0(x) is a periodic function with unit period. Here, the 
preferred scaling ratio of the construction of the Sierpinski gasket is A = 
e in(2) _ 2 ; corresponding to a log-frequency / = 1 (more generally when the 
scaling ratio is not 2, we have <p{x + 1//) = 4>{x) and A = e ln< - 2 ^). 



2.4 Simulated annealing for the node-covering and edge-covering methods 

According to the definition of fractal dimensions, we need to determine the 
minimal number of boxes necessary to cover the nodes or the edges of a net- 
work, as a function of the box scale £. In the supplementary materials of ref. [9], 
it was reported that "the minimization (of the number of boxes used to cover 
the network) is not relevant and any covering gives the same exponent" . While 
this may be true in some cases, working with arbitrary covering structures adds 
noise to the scaling laws. In order to get cleaner scaling, we have implemented 
the simulated annealing algorithm [21,22] to obtain covering partitions with a 
minimum number of covering boxes. 

The simulated annealing algorithm is implemented as follows. For the edge- 
covering problem, the network is viewed as a set of edges, which can be par- 
titioned into "boxes" or subsets of connected edges. Starting from a given 
partition with C boxes of sizes no larger than £, we consider three possible 
moves to transform the partition into a new one: 

(1) One edge is moved from one box with at least two edges in it to another 
box if the diameters of both new boxes do not exceed £; 

(2) One edge is moved out of one box with at least two edges to form a new 
box consisting of one edge; and 

(3) Two boxes merge to form a new box, a move which is allowed if the 
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diameter of the resulting box is no larger than £. 



At each temperature T, we perform k\ edge operations of the first-type of 
edge exchanges between pairs of boxes, k 2 operations of the second-type to 
form new boxes, and k% operations of the third-type merging pairs of boxes. 
An operation is accepted with probability 

if C a ^C b 

P={ , „ (13) 




if C n >C h 



where Cb and C a are the numbers of boxes before and after an operation. After 
having performed k\ + k 2 + k% possible operations, the system is cooled down 
to a lower temperature T' = cT, where c is a constant less than and close to 
1 (typically c = 0.995 ~ 0.999). 

For the node-covering method, the simulated annealing procedure is deduced 
from the one described above by applying the three types of operations to 
nodes rather than to edges. Typical values for ki, k 2 and ks are 20000, 5, and 
15. The much larger value of k\ is justified by the fact that the single edge 
transfers should be much more frequent than merging two boxes and splitting 
one boxes into two. If k 2 and k 3 were much larger relative to fci, it would be 
more difficult for the optimization to converge to a minimal box number. This 
is confirmed by our simulations. 



2.5 Numerical comparison of the nodes-covering and edges-covering box-counting 
methods 



We first compare the node-covering method developed in Ref. [9] and our 
edge-covering method by applying them on a Sierpinski network A 6 of sixth 
generation. We have used the simulated annealing algorithm described above 
to search for the minimal numbers Ny(£) and Ne(£) of covering boxes of size 
£ for the two methods. The results are shown in Fig. 4. One can observe that 
AV(1) = 143 predicted by (8) and N E (2 k ) predicted by Eq. (9) are recovered 
by the simulations. Our simulations also confirm that Ny(£) ^ Ne(£), as 
expected from the nature of the two methods. 

Although the values of Ny and Ne are quite close, their difference is such 
that their corresponding estimations of the fractal dimension are significantly 
different. The apparent fractal dimension Dy obtained using the node-covering 
method with equidistant integer values of I gives Dy = 1.30, which is the 
absolute slope of the line of \nN v (£) versus ln£ for £ = 1, 2, 3, • • • , 32 (see 
the dotted-dashed line in Fig. 4). The regression of IiiNe(£) (obtained with 
equidistant integer values of £) as a function of ln£ for £ = 1, 2, 3, ■ • • , 32, 
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shown as the dashed line, gives De = 1-37. Both values are significantly biased 
downward compared with the exact analytical value Df — 1.5850. However, 
De = 1-37 is a better estimate of Df. These biases stem from the discrete- 
scale invariant structure of the Sierpinski network, which leads to geometrically 
increasing plateaus, as can be seen in Fig. 4. The estimated dimension De, 
while also biased, is closer to the exact value, because the Sierpinski network 
is exactly self-similar in terms of edges but only asymptotically self-similar in 
terms of nodes. This suggests that the edge-covering method may give better 
results for finite-sized networks. 




/ 

Fig. 4. Comparison of simulations and theoretical analysis for the node-covering 
and edge-covering methods described in previous sections. The slope of the dot- 
ted-dashed (respectively dashed) line gives the estimation Dy = 1.30 (respectively 
De = 1.37) from the node-covering (respectively edge-covering) method. The exact 
analytical value is Df = 1.5850. 

2.6 Log-periodic oscillations and referred scaling ratio 

The data in Fig. 4 not only exhibits clear scaling behavior. In addition, there 
are log-periodic oscillations decorating the leading power law behavior, as 
expected from the theoretical considerations presented in Section 2.3. We now 
show how this log-periodicity can be extracted empirically from the data. To 
extract the log-frequency / and the preferred scaling ratio A, we first detrend 
Ne by removing the power law to investigate the quantity 

ln^(hi(^)) =\n.N E (£) -D E \n£ , (14) 

where De is estimated as the negative of the slope of the linear regression 
of \d.(Ne) as a function of ln(£). It is convenient to consider the right-side 



9 



continuous function N E {£) for 1 ^ t ^ 32 defined as 

N E {£) = N E {i), if i^£<i+l , (15) 

where i = 1, 2, • • • ,32. The dependence of N E (£) as a function of £ is shown 
in Fig. 5a. The dependence of ln 2 obtained from (14) as a function of ln 2 £ 
is shown in Fig. 5b. 




Fig. 5. Quantification of log-periodicity in the Sierpinski network, a: dependence of 
N E (£) defined by (15) as a function of I. b: dependence of 4> as a function of £ in 
log-log scale, showing strong visual evidence of the log-periodic oscillations, c: Power 
spectrum of the function log 2 [^(log 2 £)] of the variable log 2 £- The log-frequency of 
the main peak is f\ = 0.985 corresponding to the preferred scaling ratio is A = 2.02. 
d: dependence of the log-frequencies fi corresponding to the successive peaks shown 
in panel c as a function of their index i. The linear dependence suggests that the 
succession of peaks in panel c corresponds to the different harmonics of f\. A new 
estimation of f\ is obtained by the regression fi = i X f\, which yields f\ = 1.002 
corresponding to a preferred scaling ratio A = 1.997. 

In order to calculate the power spectrum of the function log 2 [</>(log 2 (£))] of the 
variable log 2 £, we sample on 300 values with regularly spacing in [log 2 (l), log 2 (32)]. 
The resulting power spectrum is shown in Fig. 5c. The log-frequency (in log- 
arithm with base 2) associated with the highest peak is f\ = 0.985 and the 
preferred scaling ratio is therefore A = e^ ln2 ^^ = = 2.021. The height of 
the highest peak is 69.9, which together with the number of data points gives 
a probability of false-positive for log-periodicity essentially 0, at all confidence 
levels [23]. In addition to the main peak in panel c of Fig. 5, many secondary 
peaks can be observed. They are found to be equidistant to a very good ap- 
proximation, suggesting that the corresponding log- frequencies are nothing 
but the harmonics / 2 = 2/i, f% = 3/i, ■ ■ ■ , fi — i x fi, ■ ■ ■ of the fundamental 
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log- frequency fi . The existence of a set of harmonics is known to increase the 
evidence of log-periodicity [24]. In addition, the measurements of the harmonic 
log-frequencies allows us to improve the estimation of the fundamental log- 
frequency fi by using a linear regression of f\ as a function of i [24] , as shown 
in Fig. 5d. The slope of this regression gives / = 1.002 and the corresponding 
preferred scaling ratio is A = 1.997, which are very close to the exact values 
fi = 1 and A = 2 respectively for the Sierpinski network. 



2.7 How to use log-periodicity to improve the estimation of the fractal di- 
mension 



Note that a regression of In N E (£) versus ln£ for equidistant integer values 
of £ = 1, 2, 3, 4, 5, • ■ • ,32 gives a biased estimate of the fractal dimension 
D E = 1.37, which is significantly smaller than the exact value Df = 1.5850. 
In contrast, sampling £ with a uniform logarithmic spacing as in Sec. 2.6 (with 
300 points) gives D E = 1.52, much closer to the exact value. 

More specifically, we stress the general property that a fractal system with 
discrete scale invariance is better sampled using data points at £ = £o\ k , 
where £q is associated with a well-chosen phase of the log-periodicity and 
k = 0, 1, 2, 3 • • ■ , which are integer powers of the underlying discrete scale 
factor A. In this way, the sub-dominant log-periodic corrections to the scaling 
are removed. 

Using this method for sampling N E (£), i.e., regressing lniV^l), lnA^^(2), 
lnjV £ (4), \nN E (8), In N E (16), and \nN E (32) for the Sierpinski network A 6 
as a function of £ = 1, 2, 4, 8, 16, and 32, gives D E = 1.5850 which is indeed 
the exact fractal dimension Df. 

Thus, we decrease the bias in the estimate of the scaling exponent D E when 
going from equidistant integer values of £ to geometrically sampled £. More- 
over, we eliminate completely the bias when the geometrical ratio used in the 
sampling method is equal to the preferred scaling ratio A characterizing the 
discrete scale invariant structure of the network. 



Obviously, this method applies and work for other fractals only if they exhibit 
the symmetry of discrete scale invariance. In our experience, using £q = 1 is a 
good choice. 
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3 Application to cellular networks 

3.1 The data sets 

We use the ERGO (formerly WIT) database, which provides links to informa- 
tion about the functional role of enzymes. More precisely, the ERGO database 
of cellular networks considers the cellular functions divided according to bio- 
engineering principles containing data sets for intermediate metabolism and 
bioenergetics (core metabolism), information pathways, electron transport, 
and transmembrane transport. We revisit the forty three cellular networks, 
which have been studied in a recent analysis developed from the point of view 
of scale-free networks [25]. 



3.2 Log-periodic oscillations in cellular networks 

We consider the same 43 cellular networks analyzed in [25]. We follow the 
procedure described in Sec. 2 to investigate each of these 43 cellular networks. 
Specifically, we implement the edge-covering method with the simulated an- 
nealing algorithm to find the minimal number N E (£) of boxes of size £ that 
cover all the edges of a given network. We find that there is a power law 
dependence of N E (£) upon £ with exponent —D E . In order to extract the 
best minimally biased estimate of D E , following the recommendations of the 
previous section, we first prune N E (£) by using Eq. (15). We then perform a 
logarithmically evenly spaced sampling on 300 values. Then, we detrend N E (£) 
by the power law £~° E to obtain <f>(£), as shown in Fig. 6b. Fourier transform 
is adopted to obtain the power spectrum of log 2 (0) with log 2 (£) being the 
independent variable, as shown in Fig. 6c. A clear peak at about / = 1 as well 
as many remarkable harmonic peaks at fi = if are observed for all the cellular 
networks. For several networks, the peak at / ~ 1 is not the highest peak. 
However, the harmonic peaks give strong evidence that / ~ 1 is the fundamen- 
tal log-frequency. In addition, the high peaks at / « 0.25 are nothing but a 
Nyquist frequency (in log-scale) corresponding to the lower frequency approx- 
imately equal to half the inverse of the whole interval (in log-scale). The total 
span of the data is about log 2 18 and its low-frequency is l/log 2 18 ~ 0.24, 
known as the most probable log- frequency [26]. The averaged log-frequency 
over all networks is thus / = 0.97 ± 0.05 and the average preferred scaling 
ratio if A = e log ( 2 ^ = 2.05 ± 0.07. We also plot fi versus i to achieve a better 
estimate of the fundamental log-frequency, as shown in Fig. 6d. We see that 
all plots show excellent linearity. This gives an averaged log-frequency over all 
networks is thus / = 0.996 ± 0.004 and the average preferred scaling ratio if 
A = e l °zWf = 2.005 ± 0.005. 



12 





log 2 (0 



log 2 (/) 




Fig. 6. Log-periodicity in the Aquifex aeolicus network, a, The dependence of Ne(£) 
as a function of I. b, The dependence of ^ as a function of I. The log-periodic 
oscillations are very clear, c, Power spectrum of log 2 [</>(log 2 (•£))]• The log- frequency 
is fi = 0.9413 and the preferred scaling ratio is A = 2.0884. The probability of 
false alarm is 7.2%. d, The dependence of fi as a function of i. The log-frequency 
is / = 0.9990 and the preferred scaling ratio is A = 2.0014. 



We now present two additional methods to obtain the averaged log-frequency. 



The first method consists in a canonical analysis of the log-periodic oscillations 
in which we perform averaging over all individual power spectra to get an 
averaged spectrum. The idea was initially introduced in the analysis of log- 
periodicity in two-dimensional turbulence [27]. Figure 7 shows the averaged 
power spectrum over 43 cellular networks. The highest peak is located at 
fx = 0.965 (thus A = 2.051). Again we see a number of harmonic peaks at 
fi = if. Linear regression of /, against i, shown in the inset of Fig. 7, gives a 
nice line with slope / = 1.0001 (thus A = 1.9998), which is the fundamental 
log-frequency. 



The second method is to average Ne(£) for each I over the 43 networks, which 
is then analyzed as if it is from an individual network. The highest peak 
is located at f\ = 0.989 (thus A = 2.015). Again we observe a number of 
harmonic peaks at fi = if. Linear regression of f\ against i, shown in the 
inset of Fig. 7, gives a nice line with slope / = 0.999 (thus A = 2.002), which 
is the fundamental log- frequency. 
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Fig. 7. Power spectrum obtained by canonical averaging over all power spectra for 
43 cellular networks. Inset: Estimation of the fundamental log-frequency. 

3.3 The significance of log-periodicity 



There are well established methods to assess the statistical significance of log- 
periodic oscillations according to the data points used in the analysis and the 
Lomb peak height 



Here, in order to test for the possibility that the log-periodic oscillations could 
be artifactual, we adopt the following bootstrapping approach. For each cel- 
lular network, we obtain the minimal number of boxes needed to cover all 
the edges of the network using simulated annealing. Due to the self-similarity 
of the network, we can calculate the residuals <p according to Eq. (14). Then 
we reshuffle the residuals series randomly to get a new reshuffled residuals 
series 0' and then put it back to construct a new series of edge-covering box 
numbers N' E = <J)'£~ De . Then we use the same procedure described in Sec. 3.2 
to construct the power spectrum. We extract the maximum height P m of the 
part with log-frequency between [0.9, 1.1]. This procedure is repeated for 1000 
times, which gives 1000 values of P m . For the real cellular network, we have 
its counterpart height P/y. Finally, the probability of a false alarm for log- 
periodicity (so-called "false positive" or error of type II) is defined by 

= #[P m > p n ] (16) 
1000 ' v ; 

where j^\P m > Pn] counts the number of networks whose Lomb peak height 
is larger than P/y 

The resulting probabilities of a false alarm for each of the 43 cellular networks 
are ordered by increasing values: 0.0040; 0.0060; 0.0080; 0.0090; 0.0120; 0.0140; 
0.0160; 0.0160; 0.0170; 0.0170; 0.0180; 0.0210; 0.0220; 0.0220; 0.0270; 0.0300; 
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0.0330; 0.0390; 0.0430; 0.0510; 0.0510; 0.0580; 0.0620; 0.0700; 0.0710; 0.0720; 
0.0730; 0.0770; 0.0790; 0.0830; 0.0840; 0.0880; 0.0910; 0.1010; 0.1110; 0.1120; 
0.1230; 0.1500; 0.1630; 0.1640; 0.1660; 0.2250; 0.2360. We find that 19 out of 
the 43 cellular networks have significant log-periodicity above the confidence 
level of 95% (Pr < 0.05) and 33 out of the 43 cellular networks have significant 
log-periodicity at the confidence level of 90% (Pr < 0.10). However, there are 
10 cases that the null hypothesis that the log-periodicity is an artifact can not 
be rejected even at the significance level of 10%. 



3.4 A simple artifactual contribution to the observed log-periodicity 

The reported probabilities of a false positive for log-periodicity are not very 
small, which cast doubts on whether log-periodicity really exists. Another cu- 
rious fact is the closeness of the measured scaling factor A to the number 2. 
This last fact suggests an artifact. Actually, there is a simple mechanism to 
produce peaks at A ~ 2 in finite networks (the effect disappears eventually 
in the log-periodic spectral analysis for infinitely large networks). The mech- 
anism is based on the fact that the edges are discrete and i is an integer. 
Since I takes integer values, the two smallest values I = 1 and i = 2 allow 
to form the ratio 2. Then, the next two values I = 3 and 4 gives the ratios 
3/2 and 4/2 = 2. Combined with the two first values I = 1,2, the "harmonic" 
4/1 appear. We thus see that two approximate log-periodic oscillations appear 
when Ne{£) is sampled at intermediate real values (we use typically 300 val- 
ues of geometrically spaced €s over a range of scale roughly 18), just from the 
existence of discreteness in the first four values of I. This effect can be approx- 
imately observed in Fig. 6b: the two first oscillations are well-formed up to 
log 2 (£) = 2, i.e., for t up to the value 4 as just described. In other words, any 
power law function which is sampled on integers will exhibit some observable 
log-periodicity with two to three approximate oscillations, just as a result of 
discretization for the first few integer values of the scale. This idea is confirmed 
by our synthetic bootstrap tests which, in complete absence of log-periodicity 
(which is destroyed by the random reshuffling), nevertheless produce peaks 
P m in the log-periodic spectral analysis which are often comparable to those 
observed in the 43 networks, hence the relatively large values of Pr (defined 
in Eq. (16)) obtained in the previous section. 

We conclude this rather pessimistic argument (with respect to the existence 
of genuine log-periodicity) by two positive notes. 

• First, notwithstanding this artificial effect, one cannot deny that the real 
cellular networks have often higher Lomb peaks for log-periodicity than 
the reshuffled ones, suggesting other sources of discrete scale invariance, 
perhaps genuine. If this is the case, the value A ~ 2 remains to be explained. 
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The simplest argument could be that the dichotomous discrete hierarchy is 
perhaps the most natural and most robust discrete hierarchy that can be 
found. 

• Second, as we have shown in section 2, taking into account the presence 
of log-periodicity, spurious or not, to sample N E (£) offers a priori a better 
less-biased estimator for the fractal dimension D E - 



3. 5 Fractal dimensions of cellular networks 



We revisit in this section the estimation of fractal dimensions of the 43 cellular 
networks using four different methods described below. Note that all these 
methods are based on the edge-covering method. 

• Method 1. We have shown in Sec. 3.2 that there are significant log-periodic 
oscillations in all the 43 cellular networks with a universal preferred scaling 
ratio A = 2. We can the use the data points at i = 2 k : N E (1), N E (2), A^(4), 
• • • , shown in Fig. 8 as stars. The inverse slope of ln[iV£:(2 fc )] against ln(£fc) = 
k In 2 is an estimate of the fractal dimension. This method is illustrated in 
Fig. 8. 

• Method 2. This method uses a logarithmic sampling using Eq. (15). For 
each network, 300 evenly spaced data points in the logarithmic abscissa are 
used and its inverse slope of the data in log-log plot is an estimate of the 
fractal dimension. 

• Method 3. This method uses all the data point at i = 1, 2, 3, ■ • ■ , shown as 
circles in Fig. 8, and fit it in log-log plot. The inverse of the slope is taken 
as the estimate of the fractal dimension. This method is also illustrated in 
Fig. 8. 

• Method 4. This method is the same as Method 3 except that this method 
plot ln[iV£;(£)] versus ln(£ + 1), as used in [9]. 

We calculated the fractal dimensions of the 43 cellular networks using these 
four different methods. The results are shown in Fig. 9. The averages of the 
estimated fractal dimensions are D E = 2.68 ± 0.15 (method 1), D E = 2.95 ± 
0.15 (method 2), D E = 2.84±0.19 (method 3), and D E = 3.47±0.27 (method 
4), respectively. It is interesting to note that D E = 3.54 ± 0.27 for Method 4 
is identical to Dy = 3.5 estimated using node-covering [9]. We argue that our 
Method 1 based on the log-periodic sampling gives a better and more robust 
estimation of the fractal dimension, due to a minimal bias. We believe that our 
method of edge-covering with simulated annealing and log-periodic sampling 
minimizes the significant bias in the log-log regression. 
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Fig. 8. Determination of the fractal dimensions. The solid line is a fit to the data 
points at i = 2 k using Method 1. The dashed line is the fit to all data points with 
Method 3. The open circles are all the calculated values of Ne for all integer £'s 
while the stars are sampled geometrically at £k = 2 fc . 
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Fig. 9. Comparison of the fractal dimensions of the 43 cellular networks estimated 
using four different methods. 

4 Conclusion 



In summary, we have proposed a new box-counting method for networks, in 
which boxes are tiled to cover all the edges of the network. The minimal 
number Ne of boxes of size £ in the edge-covering method is obtained by a 
simulated annealing algorithm. We have performed detailed analytical and nu- 
merical analysis of the Sierpinski network. The results show that the Sierpinski 
network is strictly self-similar only when using the edges (and not the nodes). 
We have shown that the node-covering method gives a stronger downward bias 
estimate of the fractal dimension of the Sierpinski network. 



17 



In addition, we have shown that we can use the discrete scale invariance of the 
Sierpinski network to characterize the log-periodic oscillations in Ne{£) and 
develop a method which removes completely the bias in the estimation of the 
fractal dimension De- Taking into account the presence of log-periodicity to 
adapt the sampling of the function Ne{£) allows us to design a better estimator 
for the fractal dimension of self-similar networks. 

We have applied this improved method to 43 cellular networks previously 
studied in the literature. We found that Ne scales with respect to £ as a 
power law Ne ~ £~ De with the fractal dimension De = 2.67 ± 0.15. Some 
of the cellular networks exhibit log-periodic oscillations in Ne- However, a 
bootstrapping statistical test shows that the existence of log-periodicity in 
cellular networks is not fully conclusive. 
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